## Low Power Algorithm Implementation And Verification Using C++


### Dan Gardner, Design Creation and Synthesis Division, Mentor Graphics




## **Traditional Flow vs. Catapult Flow**




### **Numerical Refinement & Closed Loop Verification**



### Verification/validation depends on the application and the granularity of the algorithm

- Bit Error Rate
- Mean Square Error
- No-overflow requirement
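As a rough illustration of two of the criteria above, the sketch below quantizes a floating-point reference signal to a signed fixed-point format, then reports the mean square error and the number of overflows. The names `quantize`, `evaluate`, `total_bits` and `frac_bits` are hypothetical and chosen for this example only. Rounding to nearest with saturation is an assumed policy, not one stated in the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct QuantResult { double value; bool overflow; };

// Quantize x to a signed fixed-point grid with `total_bits` bits overall and
// `frac_bits` fractional bits (assumed policy: round to nearest, saturate).
QuantResult quantize(double x, int total_bits, int frac_bits) {
    assert(total_bits > 0 && total_bits <= 32 && frac_bits >= 0);
    const double scale    = std::ldexp(1.0, frac_bits);            // 2^frac_bits
    const double max_code = std::ldexp(1.0, total_bits - 1) - 1.0; // largest code
    const double min_code = -std::ldexp(1.0, total_bits - 1);      // smallest code
    double code = std::round(x * scale);
    bool overflow = false;
    if (code > max_code) { code = max_code; overflow = true; }
    if (code < min_code) { code = min_code; overflow = true; }
    return { code / scale, overflow };
}

struct Metrics { double mse; std::size_t overflows; };

// Compare the floating-point reference against its fixed-point version.
Metrics evaluate(const std::vector<double>& ref, int total_bits, int frac_bits) {
    double sum_sq = 0.0;
    std::size_t overflows = 0;
    for (double r : ref) {
        const QuantResult q = quantize(r, total_bits, frac_bits);
        const double err = r - q.value;
        sum_sq += err * err;
        if (q.overflow) ++overflows;
    }
    const double mse = ref.empty() ? 0.0 : sum_sq / static_cast<double>(ref.size());
    return { mse, overflows };
}
```

A validation run could sweep `total_bits` and `frac_bits` and keep the narrowest format whose `mse` stays under the specification and whose `overflows` count is zero.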

### Floating-point may be an optional step

- Code fixed-point from the start
- Simulation speed essential for validation/verification
- Use exact bit-widths required to meet specification and save power/area



## **Micro-Architecture Optimization**




## **Target Optimized RTL Code Generation**



### **Multi-clock Design**

- Blocks with lower data rates run with slower clock
  - Reduction in switching power
  - **Reduction in static power by decreasing block area**
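The switching-power point can be related to the standard first-order CMOS model. The model is textbook material rather than something taken from these slides:

$$P_{\text{switch}} = \alpha \, C \, V_{DD}^{2} \, f$$

Here $\alpha$ is the switching activity per cycle, $C$ the switched capacitance, $V_{DD}$ the supply voltage and $f$ the clock frequency. Under the simplifying assumption that $\alpha$, $C$ and $V_{DD}$ are unchanged, clocking a block at $f/4$ instead of $f$ scales the power of nodes that toggle every cycle (the clock network and register clock pins, for instance) by a factor of $1/4$.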



#### Technology Constraints

#### **Architectural Constraints**

## **Closed-loop Power Analysis and Optimization**

- Power consumption data annotated into Catapult using leading power analysis tools
- Micro-architecture optimizations used to balance power/area/performance
- Average 30% power savings using this flow



## **ESL Flow**



# **Additional Information**


### Special Challenges with Mil/Aero DSP "High Cost of Failure"

#### Design reuse

- Very long product life cycles
- Legacy design difficult to retarget
  - Switching between FPGA vendors is very expensive

#### Design quality

- Achieving optimal numerical precision is difficult
- Finding optimal hardware architecture is time consuming
- Designs are typically overbuilt to guardband design goals

#### **Functional correctness**

- Mandatory for mission critical hardware
- Up to 60% of design errors come from disconnect between functional spec and RTL implementation
- RTL is too slow for system verification

#### Time to Market

- Tight milestones in government projects
  - Late-changing requirements



# Value of Algorithmic Synthesis



## **Optimized Design Architecture**

- RTL confines your implementation to few solutions in close proximity
- Structural languages offer limited tradeoffs
  - Architectural details embedded in the source
- Restricted ANSI C
  - Limits reuse
  - Complicates coding style
  - Prevents bit-accurate modeling & numerical refinement
- Pure ANSI C++ allows exhaustive exploration of design space
  - Extremely compact
  - Object oriented hardware reuse
  - Optimization through interactive constraints
  - Optimize serial vs. parallel
  - Optimize sequential vs. pipelined
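The sketch below shows the kind of untimed source the list above refers to. The function `dot` and the length `N` are hypothetical. The code contains no clocks, registers, or explicit parallelism, so a synthesis tool would be free to map the loop to one multiplier reused over `N` cycles, to `N` multipliers in parallel, or to a pipelined structure. The choice would come from constraints rather than source edits. Constraint syntax is tool-specific and not shown.

```cpp
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical vector length

// Untimed description of a multiply-accumulate over two vectors.
// The fixed trip count lets a tool unroll or pipeline the loop freely.
int dot(const int a[N], const int b[N]) {
    int acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```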



## **Interface Optimization With Interface Synthesis**




Patent-pending

### C++ source and testbench independent of HW interface

- Designers focus on architecture and function
- Micro-architecture tuned to the interface
  - Memories
  - Busses
  - Streaming data
- Adjust bit-widths to balance performance and power

## **Memory Architecture in C++**

- Power, performance and area for many algorithms are highly dependent on memory architecture
- C++ makes various memory architectures easy to explore
  - For example, something as simple as a FIR filter can take numerous "forms"
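As one possible illustration of those "forms", the sketch below writes the same FIR filter twice. `FirShift` moves every stored sample on each call, which suggests a register chain. `FirCircular` performs a single write per sample and wraps an index, which suggests a memory. The class names, `TAPS`, and `Coeffs` are hypothetical and not taken from the slides.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t TAPS = 4;  // hypothetical filter length
using Coeffs = std::array<int, TAPS>;

// Form 1: shift register. All samples move by one position per input.
class FirShift {
    std::array<int, TAPS> regs{};
public:
    int step(int x, const Coeffs& h) {
        for (std::size_t i = TAPS - 1; i > 0; --i) regs[i] = regs[i - 1];
        regs[0] = x;
        int acc = 0;
        for (std::size_t i = 0; i < TAPS; ++i) acc += h[i] * regs[i];
        return acc;
    }
};

// Form 2: circular buffer. One write per input; reads use a wrapping index.
class FirCircular {
    std::array<int, TAPS> mem{};
    std::size_t head = 0;  // position of the newest sample
public:
    int step(int x, const Coeffs& h) {
        head = (head + TAPS - 1) % TAPS;  // step back one slot, wrapping
        mem[head] = x;
        int acc = 0;
        for (std::size_t i = 0; i < TAPS; ++i) acc += h[i] * mem[(head + i) % TAPS];
        return acc;
    }
};
```

Both classes compute the same output sequence for the same input. They differ only in how the sample history is stored, which is the part that drives the power, performance and area of the resulting hardware.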



## **System Level Capabilities**



## Catapult Verification Extension: SCVerify



- SystemC Transactors
- Original C++ testbench reused to verify the RTL
- Transactors convert function calls to pin-level signal activity
- Push button solution creates Makefiles and Simulation Scripts
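The testbench reuse described above relies on the testbench talking to the design only through a function call. A minimal sketch of such a self-checking testbench follows, with a hypothetical design function `running_sum` and a closed-form golden reference. The transactors themselves are not shown. In an RTL run, the call to `running_sum` is the point where they would convert the call into pin-level activity.

```cpp
#include <cstddef>

constexpr std::size_t LEN = 16;  // hypothetical block size

// Hypothetical design under test: running (prefix) sum of the input block.
void running_sum(const int in[LEN], int out[LEN]) {
    int acc = 0;
    for (std::size_t i = 0; i < LEN; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}

// Self-checking testbench: drives 1, 2, 3, ... and compares each output
// against the closed form (i + 1)(i + 2) / 2. Returns the mismatch count.
std::size_t run_testbench() {
    int in[LEN];
    int out[LEN];
    for (std::size_t i = 0; i < LEN; ++i) in[i] = static_cast<int>(i) + 1;

    running_sum(in, out);  // the only point of contact with the design

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < LEN; ++i) {
        const int n = static_cast<int>(i) + 1;
        if (out[i] != n * (n + 1) / 2) ++mismatches;
    }
    return mismatches;
}
```

Because the stimulus and checks never refer to clocks or pins, the same `run_testbench` can validate the C++ model first and the generated RTL later.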

## **Catapult & MathWorks Partnership**

- Provides link between Catapult and MATLAB/Simulink
  - System Simulation
  - Numerical refinement
  - HW verification
- Closes the gap between algorithm design and implementation
- Focus on high-end FPGA and ASIC

